Abstract. Let $\xi$ be a random integer vector having the uniform distribution

$P\{\xi = (i_1, i_2, \ldots, i_n)\} = 1/n^n$ for $1 \le i_1, i_2, \ldots, i_n \le n$.

A realization $(i_1, i_2, \ldots, i_n)$ of $\xi$ is called good if its elements are
pairwise different. We present the algorithms Linear, Backward, Forward, Tree,
Garbage, and Bucket, which decide whether a given realization is good.
We analyse the number of comparisons and the running time of these algorithms
by simulation, gathering data on all possible inputs for small values of n and
generating random inputs for large values of n.

1 Introduction 

Let $\xi$ be a random integer vector having the uniform distribution

$P\{\xi = (i_1, i_2, \ldots, i_n)\} = 1/n^n$

for $1 \le i_1, i_2, \ldots, i_n \le n$.

A realization $(i_1, i_2, \ldots, i_n)$ of $\xi$ is called good if its elements are pairwise different.
We present six algorithms which decide whether a given realization is good. 

This problem arises in connection with the design of agricultural [4, 5, 57, 72] 
and industrial [34] experiments, with the testing of Latin [1, 9, 22, 23, 27, 32, 
53, 54, 63, 64] and sudoku [3, 4, 6, 12, 13, 14, 15, 16, 17, 20, 21, 22, 26, 29,
30, 31, 41, 42, 44, 46, 47, 51, 55, 59, 61, 64, 66, 67, 68, 69, 70, 72, 74] squares, 
with genetic sequences and arrays [2, 7, 8, 18, 24, 28, 35, 36, 37, 38, 45, 48, 
49, 50, 56, 65, 71, 73, 75], with sociology [25], and also with the analysis of the 
performance of computers with interleaved memory [11, 33, 39, 40, 41, 43, 52]. 

Section 2 contains the pseudocodes of the investigated algorithms. In Section 
3 the results of the simulation experiments and the basic theoretical results 
are presented. Section 4 contains the summary of the paper. 

Further simulation results are contained in [62]. The proofs of the lemmas 
and theorems can be found in [43]. 

2 Pseudocodes of the algorithms 

This section contains the pseudocodes of the investigated algorithms Linear,
Backward, Forward, Tree, Garbage, and Bucket. The pseudocode conventions
described in the book [19] written by Cormen, Leiserson, Rivest, and
Stein are used.

The inputs of the following six algorithms are n (the length of the sequence
s) and $s = (s_1, s_2, \ldots, s_n)$, a sequence of nonnegative integers with
$1 \le s_i \le n$ for $1 \le i \le n$, in all cases. The output is always a logical
variable g (its value is True if the input sequence is good, and False otherwise).

The working variables are usually the cycle variables i and j.

2.1 Definition of algorithm Linear 

Linear writes zero into the elements of a vector $v = (v_1, v_2, \ldots, v_n)$
of length n, then investigates the elements of the realization: if $v[s_i] > 0$
(signalling a repetition), then it stops, otherwise it adds 1 to $v[s_i]$.

Linear(n, s)
01  g <- True
02  for i <- 1 to n
03      do v[i] <- 0
04  for i <- 1 to n
05      do if v[s[i]] > 0
06            then g <- False
07                 return g
08            else v[s[i]] <- v[s[i]] + 1
09  return g
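
The pseudocode of Linear can be sketched in Python as follows. This is only an illustration, assuming that the input is a Python list whose values lie in 1..n; the function name `linear` and the unused slot `v[0]` are choices made here, not part of the paper.

```python
def linear(s):
    """Return True if the elements of s are pairwise different.

    Assumes every value of s lies in 1..n, where n = len(s).
    """
    n = len(s)
    v = [0] * (n + 1)      # v[1..n] as in the pseudocode; v[0] is unused
    for x in s:
        if v[x] > 0:       # x was met earlier: a repetition
            return False
        v[x] += 1
    return True
```

For example, `linear([3, 1, 2])` returns True and `linear([1, 2, 1])` returns False. The initialisation of `v` costs a number of assignments proportional to n even when the repetition is found immediately.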

2.2 Definition of algorithm Backward

Backward compares the second ($s_2$), third ($s_3$), ..., last ($s_n$) element of the
realization s with the previous elements until the first collision or until the
last pair of elements.

Backward(n, s)
01  g <- True
02  for i <- 2 to n
03      do for j <- i - 1 downto 1
04            do if s[i] = s[j]
05                  then g <- False
06                       return g
07  return g
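
A minimal Python sketch of Backward, under the assumption that the input is a 0-based Python list. Returning the number of executed comparisons together with the answer is an addition made here for illustration; the pseudocode returns only g.

```python
def backward(s):
    """Return (good, comparisons) for the sequence s."""
    comparisons = 0
    for i in range(1, len(s)):            # positions 2..n of the pseudocode
        for j in range(i - 1, -1, -1):    # previous elements, nearest first
            comparisons += 1
            if s[i] == s[j]:
                return False, comparisons
    return True, comparisons
```

For the input (1, 2, 1) the sketch compares $s_2$ with $s_1$, then $s_3$ with $s_2$ and with $s_1$, so it reports a repetition after 3 comparisons.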

2.3 Definition of algorithm Forward 

Forward compares the first ($s_1$), second ($s_2$), ..., last but one ($s_{n-1}$) element
of the realization with the following elements until the first collision or until
the last pair of elements.

Forward(n, s)
01  g <- True
02  for i <- 1 to n - 1
03      do for j <- i + 1 to n
04            do if s[i] = s[j]
05                  then g <- False
06                       return g
07  return g
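
Forward can be sketched in the same way. As before, the 0-based list input and the returned comparison counter are assumptions of this illustration rather than part of the pseudocode.

```python
def forward(s):
    """Return (good, comparisons) for the sequence s."""
    comparisons = 0
    n = len(s)
    for i in range(n - 1):                # positions 1..n-1 of the pseudocode
        for j in range(i + 1, n):         # following elements, nearest first
            comparisons += 1
            if s[i] == s[j]:
                return False, comparisons
    return True, comparisons
```

On the input (1, 2, 1) this sketch stops after 2 comparisons, while a backward scan of the same input needs 3, which hints at why the two algorithms can differ in their expected behaviour.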

2.4 Definition of algorithm Tree 

Tree builds a random search tree from the elements of the realization and
stops the construction as soon as it finds the next element of the
realization already in the tree (then the realization is not good) or after it
has tested the last element without a collision (then the realization is good).

Tree(n, s)
01  g <- True
02  let s[1] be the root of a tree
03  for i <- 2 to n
04      do if s[i] is in the tree
05            then g <- False
06                 return g
07            else insert s[i] in the tree
08  return g
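
The pseudocode leaves the search tree abstract. The sketch below assumes an ordinary unbalanced binary search tree in which the membership test and the insertion share one descent; the node layout `[key, left, right]` is a choice made for this illustration.

```python
def tree(s):
    """Return True if the elements of s are pairwise different."""
    if not s:
        return True
    root = [s[0], None, None]             # node = [key, left child, right child]
    for x in s[1:]:
        node = root
        while True:
            if x == node[0]:              # x is already in the tree
                return False
            side = 1 if x < node[0] else 2
            if node[side] is None:        # free place found: insert x here
                node[side] = [x, None, None]
                break
            node = node[side]
    return True
```

For example, `tree([2, 1, 3])` returns True and `tree([2, 1, 3, 1])` returns False.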

2.5 Definition of algorithm Garbage 

This algorithm is similar to Linear, but it avoids writing zeros into the
elements of a vector, which would require a linear amount of time.

Besides the cycle variable i, Garbage also uses a vector
$v = (v_1, v_2, \ldots, v_n)$ as a working variable. Interestingly, v is used
without initialisation, that is, its initial values can be arbitrary integers.

The algorithm GARBAGE was proposed by Gabor Monostori [58]. 

Garbage(n, s)
01  g <- True
02  for i <- 1 to n
03      do if v[s[i]] < i and s[v[s[i]]] = s[i]
04            then g <- False
05                 return g
06            else v[s[i]] <- i
07  return g
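
Python has no uninitialised memory, so the following sketch simulates it by filling v with arbitrary integers. Two things here are assumptions of the illustration and not part of the pseudocode: the optional parameter `v`, and the lower bound `1 <= p`, which is added so that a garbage value can never be used as an invalid index.

```python
import random


def garbage(s, v=None):
    """Return True if the elements of s (values in 1..n) are pairwise different."""
    n = len(s)
    if v is None:
        # Simulated garbage: arbitrary integers in v[0..n].
        v = [random.randint(-10 * n - 10, 10 * n + 10) for _ in range(n + 1)]
    for i in range(1, n + 1):             # 1-based position, as in the pseudocode
        x = s[i - 1]
        p = v[x]                          # claimed earlier position of the value x
        if 1 <= p < i and s[p - 1] == x:  # the claim is verified against s itself
            return False
        v[x] = i
    return True
```

The answer does not depend on the garbage: a stale value of `v[x]` is accepted only if position p really holds x, and once x has been seen, `v[x]` holds its true position.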

2.6 Definition of algorithm Bucket 

Bucket handles the array Q[1 : m, 1 : m] (where $m = \lceil\sqrt{n}\rceil$) and puts the
element $s_i$ into the rth row of Q, where $r = \lceil s_i/m \rceil$, after testing by linear
search whether $s_i$ appeared earlier in the corresponding row. The elements of
the vector $c = (c_1, c_2, \ldots, c_m)$ are counters, where $c_j$ ($1 \le j \le m$) is the
index of the first free position of the jth row, so the row contains $c_j - 1$ elements.

For simplicity we suppose that n is a square.

Bucket(n, s)
01  g <- True
02  m <- sqrt(n)
03  for j <- 1 to m
04      do c[j] <- 1
05  for i <- 1 to n
06      do r <- ceil(s[i]/m)
07         for j <- 1 to c[r] - 1
08             do if s[i] = Q[r, j]
09                   then g <- False
10                        return g
11         Q[r, c[r]] <- s[i]
12         c[r] <- c[r] + 1
13  return g
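
A Python sketch of Bucket, assuming values in 1..n and n a perfect square. Unlike the pseudocode it uses 0-based rows and counters that start at 0, so `c[r]` here is the number of elements stored in row r; these conventions are choices of the illustration.

```python
import math


def bucket(s):
    """Return True if the elements of s (values in 1..n) are pairwise different."""
    n = len(s)
    m = math.isqrt(n)
    if m * m != n:
        raise ValueError("this sketch assumes that n is a perfect square")
    q = [[None] * m for _ in range(m)]    # the m x m array Q, stored 0-based
    c = [0] * m                           # c[r] = number of elements in row r
    for x in s:
        r = (x - 1) // m                  # 0-based row, equal to ceil(x/m) - 1
        for j in range(c[r]):             # linear search in row r
            if q[r][j] == x:
                return False
        q[r][c[r]] = x                    # x is new: store it after the search
        c[r] += 1
    return True
```

A row never overflows, because row r only receives values from an interval of length m and the function returns at the first repeated value.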

3 Analysis of the algorithms 

3.1 Analysis of algorithm Linear 

The first algorithm is Linear. It writes zero into the elements of a vector
$v = (v_1, v_2, \ldots, v_n)$ of length n, then investigates the elements of the
realization sequentially: if $i_j = k$, then it tests whether $v_k > 0$ (signalling
a repetition) and otherwise adds 1 to $v_k$.

In the best case Linear executes only two comparisons, but the initialization of
the vector v requires $\Theta(n)$ assignments. It is called Linear because its running
time is $\Theta(n)$ in the best and worst cases, and so also in the expected case.

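Theorem 1 The expected number $C_{\mathrm{exp}}(n, \text{Linear}) = C_L$ of comparisons of
Linear is

$$C_L = \sqrt{\frac{\pi n}{2}} + \frac{2}{3} + \kappa(n) - \frac{n!}{n^n},$$

where

$$\kappa(n) = \frac{1}{3} - \sqrt{\frac{\pi n}{2}} + \sum_{k=1}^{n} \frac{n!}{(n-k)!\,n^k}$$

tends monotonically decreasing to zero when n tends to infinity. $n!/n^n$ also
tends monotonically decreasing to zero, but their difference $\delta(n) = \kappa(n) - n!/n^n$
is increasing for $1 \le n \le 8$ and is decreasing for $n \ge 8$.

Theorem 2 The expected running time $T_{\mathrm{exp}}(n, \text{Linear}) = T_L$ of Linear is

$T_L = n + \sqrt{2\pi n}$ + [constant fraction illegible in the source] + $2\delta(n)$,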
 n    $C_L$       $\sqrt{\pi n/2} + 2/3$   $n!/n^n$    $\kappa(n)$   $\delta(n)$
 1    1.000000    1.919981                 1.000000    0.080019      -0.919981
 2    2.000000    2.439121                 0.500000    0.060879      -0.439121
 3    2.666667    2.837470                 0.222222    0.051418      -0.170804
 4    3.125000    3.173295                 0.093750    0.045455      -0.048295
 5    3.472000    3.469162                 0.038400    0.041238      +0.002838
 6    3.759259    3.736647                 0.015432    0.038045      +0.022612
 7    4.012019    3.982624                 0.006120    0.035515      +0.029395
 8    4.242615    4.211574                 0.002403    0.033444      +0.031040
 9    4.457379    4.426609                 0.000937    0.031707      +0.030770
10    4.659853    4.629994                 0.000363    0.030222      +0.029859

Table 1: Values of $C_L$, $\sqrt{\pi n/2} + 2/3$, $n!/n^n$, $\kappa(n)$, and $\delta(n) = \kappa(n) - n!/n^n$
for $n = 1, 2, \ldots, 10$



where

$\delta(n) = \kappa(n) - n!/n^n$

tends to zero when n tends to infinity; further,

$\delta(n + 1) > \delta(n)$ for $1 \le n \le 7$ and $\delta(n + 1) < \delta(n)$ for $n \ge 8$.

Table 1 shows some concrete values connected with algorithm Linear.
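
The columns of Table 1 can be recomputed with a few lines of Python. The sketch assumes that $\kappa(n)$ equals $1/3 - \sqrt{\pi n/2} + \sum_{k=1}^{n} n!/((n-k)!\,n^k)$; this closed form is an assumption of the illustration, chosen because it reproduces the tabulated values. The names `q_sum` and `table1_row` are hypothetical.

```python
import math


def q_sum(n):
    """Sum over k = 1..n of n! / ((n-k)! n^k), accumulated term by term."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n           # now term = n! / ((n-k)! n^k)
        total += term
    return total


def table1_row(n):
    """Return (C_L, sqrt(pi n/2) + 2/3, n!/n^n, kappa(n), delta(n))."""
    base = math.sqrt(math.pi * n / 2) + 2 / 3
    good = math.factorial(n) / n ** n     # probability that an input is good
    kappa = q_sum(n) + 1 - base           # assumed closed form of kappa(n)
    delta = kappa - good
    return base + delta, base, good, kappa, delta


for n in range(1, 11):
    print(n, " ".join(f"{x:+.6f}" for x in table1_row(n)))
```

For n = 5 the sum is 1 + 0.8 + 0.48 + 0.192 + 0.0384 = 2.5104, which gives $\kappa(5) \approx 0.041238$ and $\delta(5) \approx 0.002838$, the first positive value of $\delta$.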

3.2 Analysis of algorithm Backward 

The second algorithm is Backward. This algorithm is a naive comparison-based
one. Backward compares the second ($i_2$), third ($i_3$), ..., last ($i_n$)
element of the realization with the previous elements until the first repetition
or until the last pair of elements.

The running time of Backward is constant in the best case, but it is
quadratic in the worst case.

Theorem 3 The expected number $C_{\mathrm{exp}}(n, \text{Backward}) = C_B$ of comparisons
of the algorithm Backward is

$$C_B = n - \sqrt{\frac{\pi n}{8}} + \frac{2}{3} - \alpha(n),$$

where $\alpha(n) = \kappa(n)/2 + (n!/n^n)((n + 1)/2)$ tends monotonically decreasing to
zero when n tends to infinity.

Table 2 shows some concrete values characterizing algorithm Backward.

 n    $C_B$       $n - \sqrt{\pi n/8} + 2/3$   $(n!/n^n)((n+1)/2)$   $\kappa(n)$   $\alpha(n)$
 1    0.000000    1.040010                     1.000000              0.080019      1.040010
 2    1.000000    1.780440                     0.750000              0.060879      0.780440
 3    2.111111    2.581265                     0.444444              0.051418      0.470154
 4    3.156250    3.413353                     0.234375              0.045455      0.257103
 5    4.129600    4.265419                     0.115200              0.041238      0.135819
 6    5.058642    5.131677                     0.054012              0.038045      0.073035
 7    5.966451    6.008688                     0.024480              0.035515      0.042237
 8    6.866676    6.894213                     0.010815              0.033444      0.027536
 9    7.766159    7.786695                     0.004683              0.031707      0.020537
10    8.667896    8.685003                     0.001996              0.030222      0.017107

Table 2: Values of $C_B$, $n - \sqrt{\pi n/8} + 2/3$, $(n!/n^n)((n+1)/2)$, $\kappa(n)$, and
$\alpha(n) = \kappa(n)/2 + (n!/n^n)((n+1)/2)$ for $n = 1, 2, \ldots, 10$



The next assertion gives the expected running time of algorithm Backward.

Theorem 4 The expected running time $T_{\mathrm{exp}}(n, \text{Backward}) = T_B$ of the
algorithm Backward is




where $\alpha(n) = \kappa(n)/2 + (n!/n^n)((n + 1)/2)$ tends monotonically decreasing to
zero when n tends to infinity.



3.3 Analysis of algorithm Forward 

Forward compares the first ($s_1$), second ($s_2$), ..., last but one ($s_{n-1}$) element
of the realization with the next elements until the first collision or until the
last pair of elements.

Taking into account the number of the necessary comparisons in line 04 of
Forward, we get $C_{\mathrm{best}}(n, \text{Forward}) = 1 = \Theta(1)$ and
$C_{\mathrm{worst}}(n, \text{Forward}) = \binom{n}{2} = \Theta(n^2)$.

The next assertion gives the expected running time. 

Theorem 5 The expected running time $T_{\mathrm{exp}}(n, \text{Forward}) = T_F$ of the
algorithm Forward is

$$T_F = n + \Theta(\sqrt{n}). \qquad (1)$$

Although the basic characteristics of Forward and Backward are identical,
Table 3 shows that there is a small difference in their expected behaviour.



 n    number of sequences    number of good sequences    $C_F$       $C_B$
 2                      4                           2    1.000000    1.000000
 3                     27                           6    2.111111    2.111111
 4                    256                          24    3.203125    3.156250
 5                   3125                         120    4.264000    4.129600
 6                  46656                         720    5.342341    5.058642
 7                 823543                        5040    6.326760    5.966451
 8               16777216                       40320    7.342926    6.866676
 9              387420489                      362880    8.354165    7.766159

Table 3: Values of n, the number of possible input sequences, the number of good
sequences, the expected number of comparisons of Forward ($C_F$) and the expected
number of comparisons of Backward ($C_B$) for $n = 2, 3, \ldots, 9$
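
For small n the expected number of comparisons can be obtained by running an algorithm on all $n^n$ inputs, which is how such a table can be checked. The following self-contained sketch does this for Forward and Backward; the helper names are hypothetical, and comparisons are counted once per execution of the test `s[i] = s[j]`.

```python
from itertools import product


def forward_comparisons(s):
    """Comparisons made by a forward scan until the first collision."""
    count = 0
    for i in range(len(s) - 1):
        for j in range(i + 1, len(s)):
            count += 1
            if s[i] == s[j]:
                return count
    return count


def backward_comparisons(s):
    """Comparisons made by a backward scan until the first collision."""
    count = 0
    for i in range(1, len(s)):
        for j in range(i - 1, -1, -1):
            count += 1
            if s[i] == s[j]:
                return count
    return count


def exhaustive_average(counter, n):
    """Average of counter(s) over all n^n sequences with values in 1..n."""
    total = sum(counter(s) for s in product(range(1, n + 1), repeat=n))
    return total / n ** n


for n in range(2, 7):
    print(n, exhaustive_average(forward_comparisons, n),
          exhaustive_average(backward_comparisons, n))
```

For n = 4 the averages are 820/256 = 3.203125 for the forward scan and 808/256 = 3.156250 for the backward scan. The enumeration grows as $n^n$, so it is practical only for small n.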



3.4 Analysis of algorithm Tree 

Tree builds a random search tree from the elements of the realization and
stops the construction as soon as it finds the next element of the
realization already in the tree (then the realization is not good) or after it
has tested the last element without a collision (then the realization is good).

The worst case running time of Tree appears when the input contains
different elements in increasing or decreasing order. Then the result is $\Theta(n^2)$.
The best case is when the first two elements of s are equal, so
$C_{\mathrm{best}}(n, \text{Tree}) = 1 = \Theta(1)$.

Using the known fact that the expected height of a random search tree is
$\Theta(\lg n)$, we get that the order of the expected running time is $\sqrt{n}\,\lg n$.

Theorem 6 The expected running time $T_T$ of Tree is

$$T_T = \Theta(\sqrt{n}\,\lg n). \qquad (2)$$

 n    number of good inputs    number of comparisons    number of assignments
 1                   100000                 0.000000                 1.000000
 2                    49946                 1.000000                 1.499460
 3                    22243                 2.038960                 1.889900
 4                     9396                 2.921710                 2.219390
 5                     3723                 3.682710                 2.511409
 6                     1569                 4.352690                 2.773160
 7                      620                 4.985280                 3.021820
 8                      251                 5.590900                 3.252989
 9                      104                 6.148550                 3.459510
10                       33                 6.704350                 3.663749
11                       17                 7.271570                 3.860450
12                        3                 7.779950                 4.039530
13                        3                 8.314370                 4.214370
14                        0                 8.824660                 4.384480
15                        2                 9.302720                 4.537880
16                        0                 9.840690                 4.716760
17                        0                10.287560                 4.853530
18                        0                10.719770                 4.989370
19                        0                11.242740                 5.147560
20                        0                11.689660                 5.279180

Table 4: Values of n, number of good inputs, number of comparisons, and number
of assignments of Tree for $n = 1, 2, \ldots, 20$



Table 4 shows some results of the simulation experiments (the number of 
random input sequences is 100 000 in all cases). 
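
A simulation of this kind can be sketched as follows. The sketch assumes an unbalanced binary search tree and counts one comparison for every tree node visited; how comparisons and assignments were counted in the reported experiments is not stated here, so this counting rule, the fixed seed, and the function names are assumptions of the illustration.

```python
import random


def tree_comparisons(s):
    """Key comparisons made while building the tree until the first repetition."""
    root = [s[0], None, None]             # node = [key, left child, right child]
    comparisons = 0
    for x in s[1:]:
        node = root
        while True:
            comparisons += 1              # one comparison per visited node
            if x == node[0]:
                return comparisons
            side = 1 if x < node[0] else 2
            if node[side] is None:
                node[side] = [x, None, None]
                break
            node = node[side]
    return comparisons


def simulate(n, trials, seed=0):
    """Average number of comparisons over random inputs with values in 1..n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s = [rng.randint(1, n) for _ in range(n)]
        total += tree_comparisons(s)
    return total / trials


for n in (1, 2, 3, 5, 10, 20):
    print(n, simulate(n, 100_000))
```

For n = 3 the exact expectation under this counting rule is 55/27, about 2.037, so a simulated average close to 2.04 is what one would expect from 100 000 random inputs.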

Using the method of least squares to find the parameters of the
formula $a\sqrt{n}\log_2 n$, we obtained the following approximation formula for the
expected number of comparisons:

$$C_{\mathrm{exp}}(n, \text{Tree}) = 1.245754\,\sqrt{n}\,\log_2 n - 0.273588.$$

3.5 Analysis of algorithm Garbage

This algorithm is similar to Linear, but it avoids writing zeros into the
elements of a vector, which would require a linear amount of time.

Besides the cycle variable i, Garbage also uses a vector
$v = (v_1, v_2, \ldots, v_n)$ as a working variable. Interestingly, v is used
without initialisation, that is, its initial values can be arbitrary integers.

The worst case running time of Garbage appears when the input contains
different elements and the garbage in the memory does not help, but
even in this case $C_{\mathrm{worst}}(n, \text{Garbage}) = \Theta(n)$. The best case is when the
first element is repeated in the input and the garbage helps to find a
repetition of the first element of the input. Taking this case into account we get
$C_{\mathrm{best}}(n, \text{Garbage}) = \Theta(1)$.

According to the next assertion the expected running time is $\Theta(\sqrt{n})$.

Lemma 7 The expected running time of Garbage is

$$T_{\mathrm{exp}}(n, \text{Garbage}) = \Theta(\sqrt{n}). \qquad (3)$$



3.6 Analysis of algorithm Bucket 

Algorithm Bucket divides the interval [1, n] into $m = \lceil\sqrt{n}\rceil$ subintervals
$I_1, I_2, \ldots, I_m$, where $I_k = [(k - 1)m + 1, km]$, and assigns a bucket $B_k$ to
the interval $I_k$. Bucket sequentially puts the input elements $i_j$ into the
corresponding bucket: if $i_j$ belongs to the interval $I_k$, then it checks whether $i_j$ is
contained in $B_k$ or not. Bucket works up to the first repetition. (For
simplicity we suppose that $n = m^2$.)

In the best case Bucket executes only 1 comparison, but the initialization of
the buckets requires $\Theta(\sqrt{n})$ assignments, therefore the best running time is
also $\Theta(\sqrt{n})$. The worst case appears when the input is a permutation. Then each
bucket requires $\Theta(n)$ comparisons, so the worst running time is $\Theta(n\sqrt{n})$.

Lemma 8 Let $b_j$ ($j = 1, 2, \ldots, m$) be a random variable characterising the
number of elements in the bucket $B_j$ at the moment of the first repetition. Then

$$E\{b_j\} = \sqrt{\frac{\pi}{2}} - \mu(n)$$

for $j = 1, 2, \ldots, m$, where

$$\mu(n) = \frac{1}{3\sqrt{n}} - \frac{\kappa(n)}{\sqrt{n}},$$

and $\mu(n)$ tends monotonically decreasing to zero when n tends to infinity.

Table 5 contains some concrete values connected with $E\{b_1\}$.

 n    $E\{b_1\}$   $\sqrt{\pi/2}$   $1/(3\sqrt{n})$   $\kappa(n)/\sqrt{n}$   $\mu(n)$
 1    1.000000     1.253314         0.333333          0.080019               0.253314
 2    1.060660     1.253314         0.235702          0.043048               0.192654
 3    1.090055     1.253314         0.192450          0.029686               0.162764
 4    1.109375     1.253314         0.166667          0.022727               0.143940
 5    1.122685     1.253314         0.149071          0.018442               0.130629
 6    1.132763     1.253314         0.136083          0.015532               0.120551
 7    1.140749     1.253314         0.125988          0.013423               0.112565
 8    1.147287     1.253314         0.117851          0.011824               0.106027
 9    1.152772     1.253314         0.111111          0.010569               0.100542
10    1.157462     1.253314         0.105409          0.009557               0.095852

Table 5: Values of $E\{b_1\}$, $\sqrt{\pi/2}$, $1/(3\sqrt{n})$, $\kappa(n)/\sqrt{n}$, and
$\mu(n) = 1/(3\sqrt{n}) - \kappa(n)/\sqrt{n}$ of Bucket for $n = 1, 2, \ldots, 10$
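
The values of $E\{b_1\}$ follow from $\kappa(n)$ alone. The sketch below evaluates $\sqrt{\pi/2} - \mu(n)$ with $\mu(n) = 1/(3\sqrt{n}) - \kappa(n)/\sqrt{n}$, again assuming that $\kappa(n) = 1/3 - \sqrt{\pi n/2} + \sum_{k=1}^{n} n!/((n-k)!\,n^k)$; that closed form and the function names are assumptions of this illustration.

```python
import math


def kappa(n):
    """Assumed closed form: 1/3 - sqrt(pi n/2) + sum of n!/((n-k)! n^k), k = 1..n."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n           # now term = n! / ((n-k)! n^k)
        total += term
    return 1 / 3 - math.sqrt(math.pi * n / 2) + total


def expected_bucket_size(n):
    """sqrt(pi/2) - mu(n), with mu(n) = 1/(3 sqrt(n)) - kappa(n)/sqrt(n)."""
    root = math.sqrt(n)
    mu = 1 / (3 * root) - kappa(n) / root
    return math.sqrt(math.pi / 2) - mu


for n in range(1, 11):
    print(n, f"{expected_bucket_size(n):.6f}")
```

Under this assumption the square roots cancel and the expression reduces to the sum divided by $\sqrt{n}$. For n = 4 the sum is 1 + 3/4 + 6/16 + 6/64 = 2.21875, which gives 2.21875/2 = 1.109375.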



Lemma 9 Let $f_n$ be a random variable characterising the number of comparisons
executed in connection with the first repeated element. Then

[formula for $E\{f_n\}$ illegible in the source]

where

[definition of $\eta(n)$ illegible in the source; it involves $\kappa(n)$]

and $\eta(n)$ tends monotonically decreasing to zero when n tends to infinity.

Theorem 10 The expected number $C_{\mathrm{exp}}(n, \text{Bucket}) = C_B$ of comparisons
of algorithm Bucket in 1 bucket is

$C_B = \sqrt{n}$ + [term illegible in the source] + $\rho(n)$,

where

$$\rho(n) = \frac{5/6 - \sqrt{9\pi/8} - 3\kappa(n)/2}{\sqrt{n} + 1}$$

tends to zero when n tends to infinity.

Index and Algorithm    $C_{\mathrm{best}}(n)$   $C_{\mathrm{worst}}(n)$   $C_{\mathrm{exp}}(n)$
1. Linear              $\Theta(1)$              $\Theta(n)$
2. Backward            $\Theta(1)$              $\Theta(n^2)$             $\Theta(n)$
3. Forward             $\Theta(1)$              $\Theta(n^2)$             $\Theta(n)$
4. Tree                $\Theta(1)$              $\Theta(n^2)$
5. Garbage             $\Theta(1)$              $\Theta(n)$
6. Bucket                                       $\Theta(n\sqrt{n})$

Table 6: The number of necessary comparisons of the investigated algorithms
in best, worst and expected cases

Theorem 11 The expected running time $T_{\mathrm{exp}}(n, \text{Bucket}) = T_B$ of Bucket is

[formula illegible in the source; its last term is $\varphi(n)$]

where

$\varphi(n) = 3\kappa(n) - \rho(n) - 3\eta(n)$

and $\varphi(n)$ tends to zero when n tends to infinity.

It is worth remarking that the simulation experiments of B. Novak [62] show
that the expected running time of Garbage is a few percent better than the
expected running time of Bucket.

4 Summary 

Table 6 contains the number of necessary comparisons in best, worst and 
expected cases for all investigated algorithms. 

Table 7 contains the running time in best, worst and expected cases for all 
investigated algorithms. 
Testing of sequences by simulation

Index and Algorithm    $T_{\mathrm{best}}(n)$   $T_{\mathrm{worst}}(n)$   $T_{\mathrm{exp}}(n)$
1. Linear              $\Theta(n)$              $\Theta(n)$               $n + \Theta(\sqrt{n})$
2. Backward            $\Theta(1)$              $\Theta(n^2)$             $\Theta(n)$
3. Forward             $\Theta(1)$              $\Theta(n^2)$             $\Theta(n)$
4. Tree                $\Theta(1)$              $\Theta(n^2)$             $\Theta(\sqrt{n}\,\lg n)$
5. Garbage             $\Theta(1)$              $\Theta(n)$               $\Theta(\sqrt{n})$
6. Bucket              $\Theta(\sqrt{n})$       $\Theta(n\sqrt{n})$       $\Theta(\sqrt{n})$

Table 7: The running times of the investigated algorithms in best, worst and
expected cases